About This Project

In this project, I analyzed real online retail transaction datasets to efficiently manage a warehouse inventory.

Why Inventory Management?

You might wonder, “Why inventory management?” Well, have you ever been in a situation where you were so excited to buy specific goods at a store, only to find out that the store had run out of stock? Now, that would be very disappointing and your image of the store might not be as good as it was before. For a retail company, there are two potential downsides to such situation. One, the company loses a sale. And two, the company risks losing a valued customer. Therefore, it is important that the company has enough stock to meet customers’ needs. However, on the other hand, the company should not have too many items in stock because it can be unnecessary and expensive and things can go wrong with those items. This is why doing data analysis on inventory is necessary for effectively running a company. By conducting data analysis, companies can make data driven decisions on how many items they should prepare in stock.

So how are we going to conduct data analysis to make sure that the online retail company has the right amount of items in stock? We are going to use the company’s historical transaction data to figure out the most optimal amount of items to place in stock.

Get the Historical Data

First, we need to get the company’s historical transaction data from the company’s database. We are going to use 2 years worth of data from 2009 to 2011. Ideally, you would want to use dataset that is more relevant. But for this project, we will use this dataset from 2009 to 2011.

Raw data

Examine and Wrangle the Data

Before changing the data format and structure, we need to ask ourselves what tools we are going to use for the analysis, so that we can tailor the dataset specifically for those tools. Since the dataset exceeds 1 million rows, I decided to work mostly in SQL. Let’s wrangle.

I got rid of irrelevant columns of data for this analysis, such as “Invoice” number and “Customer ID.”

Then I transformed “InvoiceDate” column into two columns “Year” and “Month”, using Excel’s YEAR and MONTH functions.

Examining the dataset using filter function in Excel, I discovered that there were negative values in “Quantity” column.

Negative quantity values

This was peculiar, but soon I realized upon close inspection that the rows with negative quantity values were either entry errors or records of unsaleable items. Because these rows were irrelevant to the analysis and would obscure the result of the analysis if left unattended, I got rid of them.

Additionally, I replaced “£” sign with “Pound” by using Find & Replace function, so that data is more readable by softwares in general.

Finally, I decided to divide the dataset into two different datasets, one with “StockCode” and “Description”, and another with all the information but “Description.” I separated the dataset so that later I could create two tables and work more flexibly in SQL using JOIN function. The cleaned datasets looked like this:

Cleaned dataset


In order to analyze the data using SQL, I exported the cleaned datasets into csv files. Then, I imported the csv files into a database.

CREATE SCHEMA inventory_management;

CREATE TABLE inventory_management.transaction_data (
    StockCode varchar(50) NULL,
    Quantity integer NULL,
    Price real NULL,
    Country varchar(50) NULL,
    Year integer NULL,
    Month integer NULL
);

COPY transaction_data (
    StockCode,
    Quantity,
    Price,
    Country,
    Year,
    Month
)
FROM '/Users/bruce/Desktop/case_studies/retail/archive/2010_2011.csv'
DELIMITER ','
CSV HEADER; --repeat this process for 2010-2011 datset!

CREATE TABLE inventory_management.stockcode (
    StockCode VARCHAR(50) NULL,
    Description VARCHAR(100) NULL
);

COPY stockcode (
    StockCode,
    Description
)
FROM '/Users/bruce/Desktop/case_studies/retail/archive/stockcode.csv'
DELIMITER ','
CSV HEADER;

Analyze the Data

Now it’s time to analyze the data! The main purpose of data analytics in inventory management is to figure out how many items in stock is good amount to efficiently serve customers in the future. To do this, I created an inventory list from the past to get a good estimate of how many items to stock in the warehouse.

Let’s get an inventory list from 2010, segmenting it by country.

SELECT transaction_data.stockcode, stockcode.description, country, sum(quantity) AS quantity
FROM inventory_management.transaction_data
LEFT JOIN inventory_management.stockcode
ON transaction_data.stockcode = stockcode.stockcode
WHERE stockcode.description IS NOT NULL AND YEAR = 2010
GROUP BY transaction_data.stockcode, stockcode.description, country
ORDER BY stockcode, country;
Here is a snapshot of the list in Excel format. You can download the list here.


Now, let’s analyze a little deeper. We can take a look at one of the most popular items on the list, JAZZ HEARTS SPIRAL NOTEPAD and see how it is distributed by Month and Country.

SELECT transaction_data.stockcode, stockcode.description, country, sum(quantity) AS quantity, month
FROM inventory_management.transaction_data
LEFT JOIN inventory_management.stockcode
ON transaction_data.stockcode = stockcode.stockcode
WHERE stockcode.description = 'JAZZ HEARTS SPIRAL NOTEPAD' AND YEAR = 2010
GROUP BY transaction_data.stockcode, stockcode.description, country, month
ORDER BY month;
Displaying records 1 - 10
stockcode description country quantity month
84077 JAZZ HEARTS SPIRAL NOTEPAD Spain 96 1
84077 JAZZ HEARTS SPIRAL NOTEPAD United Kingdom 1347 1
84077 JAZZ HEARTS SPIRAL NOTEPAD EIRE 96 2
84077 JAZZ HEARTS SPIRAL NOTEPAD France 48 2
84077 JAZZ HEARTS SPIRAL NOTEPAD Sweden 288 2
84077 JAZZ HEARTS SPIRAL NOTEPAD United Kingdom 1968 2
84077 JAZZ HEARTS SPIRAL NOTEPAD France 48 3
84077 JAZZ HEARTS SPIRAL NOTEPAD United Kingdom 4515 3
84077 JAZZ HEARTS SPIRAL NOTEPAD Sweden 576 4
84077 JAZZ HEARTS SPIRAL NOTEPAD United Kingdom 3168 4

We can use Tableau’s heatmap to see how this product is distributed by Month and Country. The heatmap should give some ideas about how much to stock the item by month.

In conjunction to the heatmap, we can do some statistics using the table as the sample. We are going to use T-statistics and find values of 95% confidence interval. Here is the initial calculation:

A simplified version with Confidence Interval limits rounded:

Looking at the table, it would be a good idea to have the items stocked at least as many as the lower limit suggests and up to as many as the upper limit suggests for each month. However, as you can see in the heatmap, this suggestion often misses. Therefore, the best solution to managing the inventory successfully is to use both heatmap and the table.

With data analysis, we will able to satisfy the customers’ needs and run the company successfully!